Wandering Aengus
VAST 2009 Challenge
Challenge 1: Badge and Network Traffic
Authors and Affiliations:
David G. Robinson, Sandia National Laboratories, drobin@sandia.gov
Tool: FishFinder, which shares a heritage with the GibbsLDA software. FishFinder is a new tool being developed for another research effort, and I was curious whether it could be used in this application.
------------------------------------------------------------------------
MC1.1:
Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.
A total of 11 suspect static IP addresses were initially identified as having unique characteristics (see figure below), and this set was then narrowed down to three. Only the top three are listed in the table: Traffic.txt
------------------------------------------------------------------------
MC1.2: Characterize the patterns of behavior of suspicious computer use.
Time constraints limited the analysis, and these results represent a first cut. The IPLog3.5.csv data set was modified to simplify this initial exploration. Specifically, the time of day was reduced to a 24-hour clock and calendar dates were changed to day of week. A variation of probabilistic latent semantic analysis was used to identify major patterns within the computer usage. Time constraints also prevented a full similarity analysis using, e.g., a Kullback-Leibler divergence measure. However, a quick look was accomplished using a variation of a Probability by Surprisal measure to compare the cluster distribution functions. The figure below presents the results of the Probability by Surprisal analysis used to identify the initial 11 suspect IP addresses. The final three were selected because they had unique Destination IP addresses.
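The preprocessing and the two kinds of measure named above can be illustrated with a minimal Python sketch. It is not FishFinder and not the variation actually used in this analysis. It only shows the textbook definitions of surprisal and Kullback-Leibler divergence, applied to a simplified (day of week, hour) view of a timestamp. The timestamp format, the reading of "24-hour clock" as hour of day, the add-one smoothing, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def simplify_timestamp(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Reduce a full timestamp to (day of week, hour on a 24-hour clock).

    The timestamp format is an assumption, not taken from IPLog3.5.csv.
    """
    dt = datetime.strptime(ts, fmt)
    return DAYS[dt.weekday()], dt.hour


def distribution(items, support):
    """Empirical distribution over `support`, with add-one smoothing.

    Smoothing keeps every probability positive so that the divergence
    below is always finite. Items are assumed to come from `support`.
    """
    counts = Counter(items)
    total = len(items) + len(support)
    return [(counts[s] + 1) / total for s in support]


def surprisal(p):
    """Surprisal (self-information) of an event with probability p, in bits."""
    return -math.log2(p)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes p and q are aligned over the same support and q is positive
    wherever p is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    hours = list(range(24))
    # Hypothetical hour-of-day activity for two hosts (made-up values).
    host_a = distribution([9, 10, 10, 11, 14, 15], hours)
    host_b = distribution([9, 10, 11, 17, 17, 17], hours)
    print(kl_divergence(host_a, host_b))
```

Under these assumptions, a host whose hourly activity profile diverges strongly from the others would stand out as a candidate for closer inspection, which is the role the similarity measure plays in the analysis described above.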